Import libraries

The main libraries used are scikit-learn, matplotlib, pandas, NumPy, and LIME.
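A minimal sketch of the imports these steps rely on (the exact set depends on the cells below; the LIME import is shown commented out in case the package is not installed):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

# LIME provides the local, feature-based explanations
# (install with `pip install lime` if missing):
# from lime.lime_tabular import LimeTabularExplainer
```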

Data Ingestion & Preparation

Data collection

This step concatenates the original training and test datasets into a single dataset.
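A minimal sketch of the concatenation, using small stand-in frames for the original files (column names here are assumptions):

```python
import pandas as pd

# Hypothetical stand-ins for the original training and test files.
train_df = pd.DataFrame({"Age": [39, 50], "Income": ["<=50K", ">50K"]})
test_df = pd.DataFrame({"Age": [25], "Income": ["<=50K"]})

# Concatenate into one frame; ignore_index renumbers rows continuously.
data = pd.concat([train_df, test_df], ignore_index=True)
```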

Data preparation

This step shows the distinct values taken by each feature.

Rows with missing values and duplicate rows are removed.
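Both removals can be sketched with pandas (toy data for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [39, 50, 50, np.nan],
    "Workclass": ["Private", "Self-emp", "Self-emp", "Private"],
})

# Drop rows with missing values, then drop exact duplicates.
df = df.dropna().drop_duplicates().reset_index(drop=True)
```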

Visualizing the distribution of values of the Weight feature (human body weight)

Removing outliers
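One common way to remove outliers is the 1.5 × IQR rule; the notebook may use a different threshold, so this is only a sketch:

```python
import pandas as pd

# Hypothetical Weight column with one extreme value.
df = pd.DataFrame({"Weight": [60.0, 72.0, 68.0, 75.0, 300.0]})

# Keep rows within 1.5 * IQR of the quartiles.
q1, q3 = df["Weight"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["Weight"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask].reset_index(drop=True)
```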

Feature selection is performed in this step: Education-num is dropped because it is redundant (it repeats information already present in another feature).

Data transformation is applied in this step. Both categorical and numerical features are transformed.
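A minimal sketch of one standard way to transform both feature types, scaling numerical columns and one-hot encoding categorical ones (the notebook's exact encoders may differ):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "Age": [39, 50, 25],
    "Workclass": ["Private", "Self-emp", "Private"],
})

# Scale numerical features and one-hot encode categorical ones.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["Age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Workclass"]),
])
X = pre.fit_transform(df)  # 1 scaled column + 2 one-hot columns
```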

The data is split randomly into training and test sets with an 80:20 ratio.
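The split can be done with `train_test_split`; the `random_state` value here is an assumption for reproducibility:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# 80/20 random split; a fixed random_state makes it reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3
)
```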

Model Building

Grid search is performed in this step to find the best hyper-parameters for the neural network. The best parameters are "hidden_layer_sizes=(15,), random_state=3, max_iter=500, activation='relu', alpha=0.001, solver='sgd'".
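A minimal sketch of the grid search with `GridSearchCV` over an `MLPClassifier`; the grid here is a small illustrative assumption (the notebook's grid is presumably larger), run on synthetic data for self-containment:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=3)

# Small illustrative grid (an assumption, not the notebook's full grid).
param_grid = {
    "hidden_layer_sizes": [(10,), (15,)],
    "alpha": [0.001, 0.01],
}
search = GridSearchCV(
    MLPClassifier(activation="relu", solver="sgd",
                  max_iter=500, random_state=3),
    param_grid,
    scoring="f1_macro",
    cv=3,
)
search.fit(X, y)
best = search.best_params_  # best hyper-parameter combination found
```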

This is the model performance for the classifier f.
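Performance is later reported as accuracy and macro-F1; a minimal sketch of both metrics on toy predictions:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)                 # fraction correct
macro_f1 = f1_score(y_true, y_pred, average="macro") # unweighted class mean
```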

Building feature-based explainer

Building explainer

Generating explaination example

Feature importance calculation and feature selection with XAI

Generating feature-based explanations for all instances and aggregating them to get global feature importance.

Drawing the feature-importance figure, including the error bars, the average importance, and how often each feature ranks among the top five in the local explanations. It is apparent that the Weight and hours-per-week features are unimportant.

Based on the figure above, the two features (Weight and hours per week) are removed and the data is updated.
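Updating the data is a column drop; the column names in this sketch are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [39, 50],
    "Weight": [70.0, 82.0],
    "Hours-per-week": [40, 45],
})

# Drop the two features flagged as unimportant by the aggregated
# explanations (column names here are assumptions).
df = df.drop(columns=["Weight", "Hours-per-week"])
```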

The updated data is used to retrain the model (classifier fa). Performance is measured by macro-F1 and accuracy.

Data-based explanation

Data preparation

30,000 data instances are sampled from the remaining data, and the labels of 15,000 of them are changed randomly, i.e., corrupted.
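For binary labels, "changed randomly" can be implemented as flipping the labels of a random subset; a sketch with synthetic labels:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical binary labels for the 30,000 sampled instances.
y = rng.integers(0, 2, size=30_000)

# Pick 15,000 instances without replacement and flip their labels
# to simulate corrupted data.
corrupt_idx = rng.choice(len(y), size=15_000, replace=False)
y_corrupted = y.copy()
y_corrupted[corrupt_idx] = 1 - y_corrupted[corrupt_idx]
```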

Building data-based explainer

The result shows that the corrupted groups are assigned negative values, whereas the useful data groups are assigned high values.

Further validation by retraining the model on the adverse and valuable data groups
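A toy sketch of this validation: train one model on a clean ("valuable") group and one on a label-flipped ("adverse") group of the same size, then compare test accuracy. LogisticRegression is used here only for speed; the notebook retrains its neural network:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=8, random_state=3)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

# Build the adverse group by flipping half of the training labels.
rng = np.random.default_rng(3)
y_adverse = y_tr.copy()
flip = rng.choice(len(y_tr), size=200, replace=False)
y_adverse[flip] = 1 - y_adverse[flip]

acc_valuable = accuracy_score(
    y_te, LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te))
acc_adverse = accuracy_score(
    y_te, LogisticRegression(max_iter=1000).fit(X_tr, y_adverse).predict(X_te))
```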

Calculating averages and errors

Drawing the figure